Chapter 9 Multivariate models, predictive accuracy, and handling outliers
In this chapter we’re going to consider the geometry of models with multiple continuous predictors. We’re also going to talk about the predictive accuracy of our models, which can help us understand how well our models represent our data. Finally, we’re going to talk about fitting models with t-distributed errors that can work well for data with outliers.
9.1 Data and research questions
We’re still working with the perceptual data we used in Chapters 7 and 8, described in detail at the beginning of those chapters. We load the data below:
# data setup
#################################################
library (plotly)
library (brms)
options (contrasts = c("contr.sum","contr.sum"))
# source data
url1 = "https://raw.githubusercontent.com/santiagobarreda"
url2 = "/stats-class/master/data/h95_experiment_data.csv"
h95 = read.csv (url(paste0 (url1, url2)))
# set up colors for plotting
devtools::source_url (paste0 (url1, "/stats-class/master/data/colors.R"))
# source functions
devtools::source_url (paste0 (url1, "/stats-class/master/data/functions.R"))
We’re going to consider two models investigating the perception of speaker height and adultness from speech acoustics. The models include the following variables:
pheight: a continuous variable indicating perceived height in inches (or its scaled counterpart pheight_s).
padult: a factor indicating whether subjects indicated hearing an adult (padult = a) or a child (-padult = c).
pgender: a factor indicating whether subjects indicated hearing a female (pgender = f) or male (-pgender = m) speaker.
g0: log-f0 (or its scaled counterpart g0_s), a continuous predictor.
gbar: \(\bar{G}\), log-mean formant frequency (or its scaled counterpart gbar_s), a continuous predictor.
vowel: a factor with two levels indicating the vowel category of the stimulus, vowel1 = a, -vowel1 = i.
subj: a factor indicating subject/listener (n = 10).
speaker: a factor indicating speaker (n = 139).
To this point we’ve only discussed the fundamental frequency (f0) of vowel sounds, related to voice pitch. Although this is a simpler variable to conceive of, it turns out that it’s not really the best predictor of the perception of many salient indexical characteristics such as height, age and gender.
In my work on the perception of apparent speaker characteristics, I have repeatedly found that the formant frequencies usually influence speaker judgments more than f0. A brief explanation of the relationship between formant frequencies and size goes:
On average, taller people have longer vocal tracts.
People with longer vocal tracts produce lower resonance frequencies (i.e. formants). Think of the difference between a trumpet and a tuba (longer = lower frequency).
If we measure the formants of a vowel sound and take the average, we get a reasonable estimate of a person’s vocal tract length and height (see point 1).
Listeners appear to use information related to average formant frequency to assess the height and age of speakers.
To measure the average formant frequency, we use the average log-transformed formant frequency across both productions available for each speaker (denoted by \(\bar{G}\)). Again I won’t get into the details of why, but it turns out that using the logarithm of the frequency has certain desirable characteristics for our linear models (explained in painstaking detail here).
All of the above is just to justify the inclusion of this second acoustic predictor. In the figure below we can see that f0 is clearly correlated with perceived height. The figure includes Pearson’s correlation coefficient calculated between the two variables. This is a value between -1 and 1 indicating how closely the two variables fall along a straight line when plotted together.
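To get a feel for the correlation coefficient, consider the toy illustration below (made-up vectors, not the chapter data):

```r
set.seed (1)
x = rnorm (10000)
cor (x, 2*x + 5)             # a perfect line: r = 1
cor (x, -x + rnorm (10000))  # a noisy negative relation: r near -0.71
cor (x, rnorm (10000))       # unrelated variables: r near 0
```

Note that r depends only on how tightly the points cluster around a line, not on the line’s slope or intercept.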
The correlations calculated below are not really ‘valid’ for repeated-measures data like ours. However, they do help us confirm what the figures below show: \(\bar{G}\) seems to have a more linear relationship to perceived height than f0 does. Further, the two predictors are highly correlated overall. This makes sense since, across the human population, shorter people (with higher formants) generally also have higher pitches (e.g. there are no low-pitched babies).
The location of the points in the 2-dimensional plots above is defined by the value of the two variables defining the dimensions. However, each point is actually associated with 3 continuous variables (f0, \(\bar{G}\), and perceived height), meaning that each point is actually associated with a specific location in a 3-dimensional space.
Imagine you’re holding a clear plastic cube containing colored points arranged in three dimensions. Looking ‘through’ the faces of the cube from different directions would show you arrangements just like the 2-d plots. In static images I can only show you 3 of the faces of this cube. However, the interactive plot below can be rotated such that when you look ‘down’ certain axes you will see plots that look just like the 2-d plots.
Figure 9.1: Each point represents a single speaker. Speakers are plotted according to their average acoustics and average reported height.
Figure 9.2: A 3-dimensional representation of our data. The plane or groups of points can be turned on/off by clicking on the elements in the legend.
When considered in three dimensions, rather than forming a line, these points can form a plane. For example, the points above almost look like stairs, or a bridge, going from smaller to larger perceived heights.
A regression model that includes two continuous predictors tries to find the ‘best’ plane that passes through your points in the three-dimensional space. The ‘best’ plane is defined as the one that minimizes the distance from each point to the surface of the plane along the axis defined by the dependent variable. Though it’s a bit more complicated than this for multilevel models, this is still basically what’s happening.
The figure above also shows the ‘best’ plane going through our points. Just like our slope coefficients changed the slope of our lines, our slope coefficients now change the slope of our planes. Since the plane has two dimensions it has two slopes: a field can be downhill away from you, but also be up/downhill left to right. Our intercept coefficients will change the intercepts of our planes, sliding entire planes up/down without changing their slopes.
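As a sketch of this geometry, the code below fits a plane with ordinary least squares to made-up speaker-level data (the ‘true’ coefficients are loosely based on the fixed effects reported later in this chapter; this is plain lm(), not the multilevel model):

```r
# made-up speaker-level data: an intercept and two 'true' slopes
set.seed (1)
g0_s = rnorm (50)
gbar_s = rnorm (50)
pheight = 61 - 1.5*g0_s - 3*gbar_s + rnorm (50, sd = 2)

# lm() finds the 'best' plane: one intercept plus one slope per predictor
coef (lm (pheight ~ g0_s + gbar_s))
```

The two slope coefficients tilt the plane along each predictor’s axis, while the intercept slides the whole plane up and down.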
9.2 Using multiple continuous predictors
We’re going to begin by modeling perceived height as a function of f0 and \(\bar{G}\), except we’re going to include all of the other complexity we’ve been building up so far, and some new predictors. Now that we can use Bayesian ANOVA to interpret our models, using so many predictors isn’t as daunting as it might otherwise be.
9.2.1 Describing and fitting the model
Our model formula is going to be as below. Note that padult and pgender interact with our continuous predictors, but vowel does not. This is because I think it’s reasonable that perceived gender or adultness affects the use of acoustic cues. However, I think vowel category will probably not have such a complicated effect and so it is included only as an intercept.
pheight ~ (g0_s + gbar_s) * padult * pgender + vowel +
((g0_s + gbar_s) * padult * pgender + vowel | subj) +
(1 | speaker)
Which in plain English means:
“We’re predicting perceived height using two continuous predictors (g0_s and gbar_s). The slopes of these planes are allowed to vary according to padult, pgender, and the interaction of the two. Our model includes intercept shifts for these planes according to padult, pgender, the interaction of the two, and vowel. All of the aforementioned effects are included as ‘random’ by-subject effects, and the model includes random by-speaker intercepts as well.”
Below, I set up the necessary variables in our data:
# standardize log f0
h95$g0_s = (h95$g0-mean(h95$g0)) / sd(h95$g0)
# standardize log geometric mean formant frequency
h95$gbar_s = (h95$gbar-mean(h95$gbar)) / sd(h95$gbar)
# create gender and adultness variables
h95$pgender = c('m','w')[h95$pgroup %in% c('g','w')+1]
h95$padult = c('c','a')[h95$pgroup %in% c('m','w')+1]
And fit our model.
library (brms)
options (contrasts = c("contr.sum","contr.sum"))
set.seed (1)
height_perception_multi =
brms::brm (pheight ~ (g0_s + gbar_s)*padult*pgender+vowel +
((g0_s + gbar_s)*padult*pgender+vowel|subj) +
(1|speaker),
data=h95, chains=4, cores=4, warmup=1000, iter = 6000, thin = 4,
control = list(adapt_delta = 0.95),
prior = c(set_prior("student_t(60, 0, 12)", class = "Intercept"),
set_prior("student_t(3, 0, 6)", class = "b"),
set_prior("student_t(3, 0, 6)", class = "sd"),
set_prior("lkj_corr_cholesky (2)", class = "cor")))
# save model
# saveRDS (height_perception_multi, '9_height_perception_multi.RDS')
9.2.2 Interpreting the model
I’m not going to show the model print statement because it’s several pages long, although I suggest you have a look at it. Instead, we are going to print just the fixed effects and calculate a Bayesian ANOVA.
fixefs = fixef (height_perception_multi)
fixefs
## Estimate Est.Error Q2.5 Q97.5
## Intercept 61.4316520 0.57274665 60.3026838 62.58137427
## g0_s -1.5139547 0.34844231 -2.1950206 -0.82717675
## gbar_s -3.0226344 0.37068405 -3.7583257 -2.29727790
## padult1 3.4151184 0.52362667 2.3514839 4.46738164
## pgender1 -0.9663628 0.33552523 -1.6360612 -0.29110462
## vowel1 -0.2323974 0.06844473 -0.3676769 -0.09972511
## g0_s:padult1 0.1087725 0.24090763 -0.3737782 0.58894607
## gbar_s:padult1 0.8556476 0.28387466 0.2844266 1.43293158
## g0_s:pgender1 -0.3373060 0.19375078 -0.7131054 0.04554423
## gbar_s:pgender1 0.2663032 0.18770930 -0.1095363 0.63366910
## padult1:pgender1 0.3061490 0.32586586 -0.3224177 0.97448355
## g0_s:padult1:pgender1 -0.0951291 0.21138089 -0.5138048 0.30520675
## gbar_s:padult1:pgender1 -0.5100064 0.19986603 -0.8978581 -0.11959186
baov = banova(height_perception_multi)
We then use this information to make an ANOVA plot to see what the major sources of variation are in this dataset. Both acoustic predictors have a strong influence on the perception of height. In addition, the perception of adultness matters a lot, and there appears to be a non-zero adultness by \(\bar{G}\) interaction. There also appears to be a small effect for the perception of gender, an even smaller effect for vowel category, and perhaps some other small interactions.
par (mar = c(4.2,13,1,1))
banovaplot (baov[-2,], las = 2, xlab="Effect (inches)", horizontal = FALSE,
xaxs = 'i', xlim = c(0,5))
Figure 9.3: A Bayesian ANOVA plot of our model. These sorts of plots are described in chapter 8.
In other chapters I talk about how to recover and interpret specific parameters, and all of the same methods can be applied here. Here I’m going to interpret only a subset of the possible parameter combinations, mostly as they relate to useful examples. The interpretation is going to focus on the geometry of the planes suggested by our model.
First, we can consider the effect of the intercept shifts on the planes created by our model if we constrain all of the planes to have the same slope. The figure below contains a large number of planes: an overall plane, one for adults, and one for children. The adult and child planes differ from the overall plane according to the ±padult effect. Then there are four multicolored planes: the adult and child planes have each been split into male (upper) and female (lower) planes within each age group. These differ from the adult and child planes based on the ±pgender predictor.
I didn’t make a plot of the interaction between perceived adultness and perceived gender, but if I had it would manifest as a difference in the spacing between upper and lower multicolored plane pairs!
Figure 9.4: Three dimensional representation of data with multiple regression planes overlayed. Plot elements can be turned on/off by clicking on the legend.
The planes above correspond to equations that would look something like below. Of course, the notation below is neither a model formula nor a prediction equation so it kind of doesn’t make sense. However, I think it gets the general point across: the planes above all have the same slopes and just differ in the application of the padult and pgender intercept shifts.
Left:
pheight = (g0_s + gbar_s) # overall
Middle:
pheight = (g0_s + gbar_s) + padult1 # adult
pheight = (g0_s + gbar_s) - padult1 # child
Right:
pheight = (g0_s + gbar_s) + padult1 + pgender1 # w
pheight = (g0_s + gbar_s) + padult1 - pgender1 # m
pheight = (g0_s + gbar_s) - padult1 + pgender1 # g
pheight = (g0_s + gbar_s) - padult1 - pgender1 # b
Now, we will consider different planes constrained to have the same intercept. We can see the same overall plane in grey. The figure also contains planes indicating differences in slope for each continuous predictor for children (blue) and adults (red). By comparing the child-adult slopes for a single continuous predictor, we can see the effect that slope interactions have on our planes.
Figure 9.5: Three dimensional representation of data with multiple regression planes overlayed. Plot elements can be turned on/off by clicking on the legend.
The planes above correspond to equations that would look something like below. Again, I am using these weird pseudo-formulas that I hope make sense. The main idea now is: the planes above all have the same intercept and only differ in their slopes along each dimension based on padult.
Left:
pheight = g0_s + gbar_s # overall
Middle:
pheight = g0_s + (gbar_s + gbar_s:padult1) # adult gbar
pheight = g0_s + (gbar_s - gbar_s:padult1) # child gbar
Right:
pheight = (g0_s + g0_s:padult1) + gbar_s # adult g0
pheight = (g0_s - g0_s:padult1) + gbar_s # child g0
Below, I show a final set of planes representing what might be a ‘final’ model that captures most of the important variation in the data without over-complicating things (in my opinion). The planes below reflect a fixed-effects structure corresponding to a model formula of:
pheight ~ g0_s + gbar_s*padult
In other words, the model predicts perceived height based on g0, \(\bar{G}\) (gbar), and perceived adultness, with an interaction between \(\bar{G}\) and perceived adultness only. Notice that the planes feature both the spacing and slope differences independently conferred by the intercept and slope interactions.
Figure 9.6: Data with ‘final’ planes (top=adults, bottom = children).
9.3 Considering predictive accuracy
So far, we haven’t said much about the accuracy with which our models predict our data, but we’re going to discuss this a bit here.
First, we’re going to predict the data using our model, which is easy using the predict function provided by brms. This function generates posterior predictions (\(y_{rep}\), ‘rep’ for replicated) generated by the model, which can be thought of in several ways.
\[ \begin{equation} \begin{split} \mu = Intercept + x_1 + x_2 + ... + x_n \\ y_{rep} = \mu + \epsilon \\ \epsilon \sim \mathrm{Normal} (0,\ \sigma_{error}) \end{split} \tag{9.1} \end{equation} \]
Above we see that the posterior predictions generated by our model consist of the linear prediction (\(\mu\)) combined with normally-distributed error (with a standard deviation based on our data). Below, we directly generate a normally-distributed random variable with a given mean and standard deviation parameter:
\[ \begin{equation} \begin{split} \mu = Intercept + x_1 + x_2 + ... + x_n \\ y_{rep} \sim \mathrm{Normal} (\mu,\ \sigma_{error}) \end{split} \tag{9.2} \end{equation} \]
It’s important to note that the posterior predictions contain our predicted values plus random error. This increases the variation we can expect across individual samples. However, the posterior predictions and the linear predictor (\(\mu\)) will tend to be equal when one only considers the average value of \(y_{rep}\) or \(\mu\) for a given data point, across all posterior samples.
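This averaging can be demonstrated with a small simulation (the values of \(\mu\) and \(\sigma_{error}\) below are made up for illustration):

```r
mu = 60            # a hypothetical linear prediction for one observation
sigma_error = 2.5  # a hypothetical error standard deviation

# many posterior-prediction-style draws for this single observation
y_rep = rnorm (10000, mean = mu, sd = sigma_error)

mean (y_rep)  # close to mu: the added error averages out
sd (y_rep)    # close to sigma_error: individual draws still vary
```

Individual draws of y_rep scatter widely around \(\mu\), but their average converges to \(\mu\) as the number of samples grows.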
If we look at the output of the predict function we can see that it’s a matrix with as many rows as we have observations. For each observation it provides the mean posterior prediction, and a range of credible values.
y_pred_full = predict (height_perception_multi)
str (y_pred_full)
## num [1:2780, 1:4] 53.2 53.2 51.2 53.4 63.8 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
head (y_pred_full)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 53.18602 2.719053 47.87568 58.56740
## [2,] 53.18749 2.716867 47.83934 58.48599
## [3,] 51.23023 2.701361 45.88961 56.51074
## [4,] 53.36592 2.811862 47.94704 58.86529
## [5,] 63.80231 2.813683 58.28340 69.20151
## [6,] 52.62026 2.689347 47.39278 57.85995
Below we can use the predict function to investigate how accurately our model can predict our data. Using the re_formula parameter, we can make predictions using different random effects.
The reason we might want to omit random effects when making predictions is that if our model makes excellent predictions only because of its random effects, it may not generalize well to the broader population. So, making sure that our models can accurately predict our data even when using only the ‘population-level’ effects is an important aspect of checking model predictive accuracy.
# no random effects in predictions
y_pred_no_re = predict (height_perception_multi, re_formula = NA)
# only speaker random effects in predictions
y_pred_sp = predict (height_perception_multi, re_formula = ~(1|speaker))
# only subject random effects in predictions
y_pred_su =
predict (height_perception_multi,
re_formula = ~ ((g0_s + gbar_s) * padult * pgender + vowel | subj))
# subject and speaker random effects in predictions
y_pred_full =
predict (height_perception_multi,
re_formula = ~ ((g0_s + gbar_s) * padult * pgender + vowel | subj) +
(1|speaker))I subtract our actual data from our model predictions to calculate the residuals under different conditions. The residuals are the difference between the data we observed and the predictions made by our model for each data point.
prediction_error =
data.frame (no_re = y_pred_no_re[,1] - h95$pheight,
sp = y_pred_sp[,1] - h95$pheight,
su = y_pred_su[,1] - h95$pheight,
full = y_pred_full[,1] - h95$pheight)
Below you can see histograms of each of the sets of residuals from above. It seems that the subject random effects are more important than the speaker random effects for predictions, though both matter.
Figure 9.7: Histogram of prediction error for predictions with no random-effects (No RE), speaker random-effects only (SP Only), subject random-effects only (Su Only), or full random effects (Full RE).
There are many ways to consider the ‘average’ error. Two of the most useful and common are the root-mean-squared error (RMS) and the mean absolute error (MAE). Below I define simple functions that carry out each operation.
# mean absolute error
mae = function (x) mean (abs (x))
# root-mean-squared error
rms = function (x) sqrt (mean (x^2))
The main difference between them is in how they ‘equalize’ negative and positive values. The rms function squares the values, while the mae function takes the absolute value of each error. The practical effect of this difference is that RMS error (i.e., error as defined by the rms function) will give more weight to larger residuals.
For example, consider the ‘average’ residual calculated by the two functions for this tiny set of residuals: c(2, -1, 0, 0, 10). The mae function returns a value of 2.6, while the rms function returns a value of 4.6. This is because in squaring the residuals, the rms function substantially increases the influence of large deviations. The squares of our residuals are 4, 1, 0, 0, and 100. The ratio between our largest and smallest non-zero values has gone from 5 (10/2) to 25 (100/4) after squaring.
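We can check this arithmetic directly (the functions are repeated here so the snippet stands on its own):

```r
mae = function (x) mean (abs (x))
rms = function (x) sqrt (mean (x^2))

resids = c(2, -1, 0, 0, 10)
mae (resids)  # (2 + 1 + 0 + 0 + 10) / 5 = 2.6
rms (resids)  # sqrt((4 + 1 + 0 + 0 + 100) / 5) = sqrt(21), about 4.58
```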
We can apply both functions column-wise to our residuals with the apply function.
apply (prediction_error, 2, rms)
## no_re sp su full
## 3.377059 3.283754 2.666586 2.550822
apply (prediction_error, 2, mae)
## no_re sp su full
## 2.642946 2.569479 2.038613 1.954712
We can see that, as expected, the MAE is smaller than the RMS error. Which should we use? Which is better? These are just two of many ways to calculate average error or accuracy. The RMS error relates directly to our standard deviation parameter and so is useful in that sense. For many purposes, however, the MAE may be closer to what people mean by ‘accurate’ (i.e., how close can we get on average?).
So is our prediction ‘good’? I don’t know. It seems to me that being able to guess perceived height to within 2-3 inches is pretty good, especially given that the underlying data had a standard deviation of 8 inches. However, whether these constitute ‘accurate’ predictions really depends on larger theoretical issues, and the situation in which the predictions are being made and evaluated.
9.4 Handling outliers: t-distributed errors
The t distribution is a lot like the normal distribution except that it has a third parameter that determines how much it differs from the normal distribution: a parameter called nu (\(\nu\)), which can vary continuously from 1 to \(\infty\). If you are familiar with t-tests, this parameter is the ‘degrees of freedom’ parameter in that test. When this parameter is equal to 1, the t distribution is maximally different from the normal distribution. As this parameter approaches \(\infty\), the two distributions become identical.
Below, we see densities of three t distributions varying in \(\nu\). Each distribution has a mean of 0 and a standard deviation of 1. This means that the x-axis directly indicates standard deviations from the mean.
We see that when \(\nu\) is small, the distribution is pointy and has what are called “fat tails”. For this reason the t distribution is sometimes said to resemble a “witch’s hat”. On the other hand when \(\nu\) is large, the distribution has much less density in its tails and looks more like a normal distribution.
The real difference is in the probability with which these distributions tolerate rare events. The three distributions begin to diverge at around ±2 standard deviations, a fact which comes across more clearly in the figure on the right below. For example, the blue and red curves below seem to differ in probability by a factor of nearly 100 at x = -4. So, an event that is four standard deviations from the mean is rare, but is much more likely to arise from a t distribution with \(\nu=1\) than from one with \(\nu=1000\).
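We can check the size of this tail difference with R’s dt function, which returns t densities with a scale of 1:

```r
dt (-4, df = 1)     # heavy-tailed t: about 0.019
dt (-4, df = 1000)  # nearly normal: about 0.00014

# over a 100-fold difference in the probability of this 'rare' event
dt (-4, df = 1) / dt (-4, df = 1000)
```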
Figure 9.8: (left) Densty of t distributions with nu parameters of 1 (red), 10 (green) and 1000 (blue). (right) Same as left plot but y axis is logarithmic. Each horizontal line indicates a 100-fold change in probabilities.
As a result of this, a t distribution with a small \(\nu\) parameter will predict roughly the same behavior as a normal distribution within 2 standard deviations of the mean (in the ‘bulk’ of the distribution), but will be much more tolerant of outliers in the data (in the ‘tails’). In practice, this means that a model that assumes t-distributed, rather than normal, residuals will be more robust to outliers and will not be influenced as much by extreme values.
Below, I compare histograms of random data generated using the t and normal distributions. These distributions have the same standard deviation parameter (1) and mean (0). When we focus on the bulk of the distributions (±2 standard deviations from the mean), the data looks quite similar. However, when we zoom out we see that our t distribution has generated quite a few data points that would be considered extreme outliers by the normal distribution. The only way for our normal distribution to generate values so far from the mean would be to have a substantially larger standard deviation.
Figure 9.9: (top row) Histogram of 10,000 random draws from a normal distribution. Left and right columns present same numbers, just a different x-axis range. Points along bottom indicate individual observations. (bottom row) Same as top row but data is drawn from a t distribution with a nu parameter of 4.
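Data like that in the figure can be generated as below; these lines also define the x_norm and x_t variables compared in the following code (the seed is an assumption, so exact values will differ slightly from those shown):

```r
set.seed (1)
# 10,000 draws from a normal distribution with mean 0 and sd 1
x_norm = rnorm (10000, mean = 0, sd = 1)
# 10,000 draws from a t distribution with the same location and scale
# parameters (0 and 1) and a nu parameter of 4
x_t = rt (10000, df = 4)
```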
Below we can see that the outliers present in the t-distributed data result in a roughly 40% overestimation of the standard deviation of the distribution (1.4 vs. 1), which we know should be 1. This makes it seem that the t-distributed data is much more spread out than the normally-distributed data. When we consider a more robust measure of the spread of the data (the interquartile range), the spreads of the two variables actually appear much more similar.
# compare standard deviations of variables
sd (x_norm)
## [1] 1.012356
sd (x_t)
## [1] 1.429296
# compare inter-quartile ranges of variables
as.numeric ( diff ( quantile (x_norm, c(.25,.75))) )
## [1] 1.351055
as.numeric ( diff ( quantile (x_t, c(.25,.75))) )
## [1] 1.473742
One reason that we care about this is that our estimate of \(\sigma_{error}\) affects the uncertainty in all of our parameter estimates. As a result, outliers that increase the estimate of \(\sigma_{error}\) propagate uncertainty throughout the whole model. By using t-distributed errors we can potentially get better parameter estimates without having to omit outliers from our data.
Below we use the residuals function to collect the residuals from the model we fit above. The residuals function gives you a posterior mean and credible intervals for your residuals (just like the predict function does for your predictions). This is because you get a different residual for each of your parameter samples.
resids = residuals (height_perception_multi)
head (resids)
## Estimate Est.Error Q2.5 Q97.5
## [1,] -1.82732543 0.6721898 -3.1178915 -0.4707440
## [2,] -2.46747336 0.7306226 -3.8611248 -1.0282123
## [3,] 0.95301794 0.6094575 -0.2423565 2.1408184
## [4,] 2.24925404 0.8582907 0.5669350 3.9378093
## [5,] -1.08832468 0.8496259 -2.7427952 0.5509476
## [6,] 0.07557465 0.6404586 -1.1794754 1.3211662
Below we can compare the density of the residuals with the densities of normal and t distributions with the same standard deviation. The logarithmic axis makes it easier to understand the behavior of the distributions in the tails. For instance, everything below the bottom horizontal line (at 0.01 in the left panel and 1e-03 in the right panel) is very difficult to see without logarithmic y axes.
In the middle plot below we see that a normal distribution does a very good job of representing the residuals in the bulk of the distribution. However, our residuals have a lot more ‘density’ in values greater than 10 and less than -10 (i.e., in the ‘tails’). In the right panel we see that a t distribution (with \(\nu = 10\)) offers a closer approximation of the overall shape of the distribution, even for extreme values.
Figure 9.10: (left) Histogram of residuals. (middle) Density of residuals (blue) compared to density of a normal distribution (red). The y axis is logarithmic: each horizontal line is a 100-fold increase in the probability of an event. (right) Density of residuals (blue) compared to density of a t-distribution (green) with a nu parameter of 10.
9.4.1 Fitting and interpreting the model
Fitting a model with t-distributed errors is very simple with brms: you just need to change the family parameter to family='student'. When you do this, you are telling brms that your model is now like this (for a simple regression model):
\[ \begin{equation} \begin{split} \textrm{Likelihood:} \\ y_{[i]} \sim \mathrm{student}(\nu, \mu_{[i]},\sigma_{error}) \\ \mu_{[i]} = a_{[i]} + b_{[i]} * \mathrm{x}_{[i]} \\ \end{split} \tag{9.3} \end{equation} \]
Notice that we are now drawing our data from a t distribution (which I’m calling student), and the distribution includes the \(\nu\) parameter. The default prior for \(\nu\) is gamma(2, 0.1) with a fixed lower bound of 1, as explained here. You can get a feel for what that means by using the curve function. Below, we compare the default prior for \(\nu\) with one that more easily allows for larger parameter values.
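A comparison like the one in the figure can be drawn with something like the code below (the second shape/rate pair is an assumption, chosen only to spread probability over larger values of \(\nu\)):

```r
# default brms prior for nu: gamma(2, 0.1)
curve (dgamma (x, shape = 2, rate = 0.1), from = 1, to = 100,
       xlab = "nu", ylab = "Density")
# a flatter alternative that more easily allows large nu values
curve (dgamma (x, shape = 2, rate = 0.02), add = TRUE, lty = 2)
```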
Figure 9.11: A comparison of the densities of two gamma distributions.
We’re going to fit the model above again, this time using a t distribution for our residuals by setting family = 'student'.
library (brms)
options (contrasts = c("contr.sum","contr.sum"))
set.seed (1)
height_perception_student =
brms::brm (pheight ~ (g0_s + gbar_s) * padult * pgender + vowel +
((g0_s + gbar_s) * padult * pgender + vowel | subj) +
(1 | speaker),
data=h95, chains=4, cores=4, warmup=1000, iter = 6000, thin = 4,
control = list(adapt_delta = 0.95), family = 'student',
prior = c(set_prior("student_t(60, 0, 12)", class = "Intercept"),
set_prior("student_t(3, 0, 6)", class = "b"),
set_prior("student_t(3, 0, 6)", class = "sd"),
set_prior("lkj_corr_cholesky (2)", class = "cor")))
# save model
saveRDS (height_perception_student, '9_height_perception_student.RDS')
If we inspect the print statement we will see all of the usual information, with a new line for \(\nu\) (nu) in the “Family Specific Parameters” section. We can see that our model suggests our residuals resemble a t distribution with about 7 degrees of freedom, meaning the error distribution is not really close to normal.
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 2.23 0.06 2.11 2.34 1.00 4795 4884
nu 6.87 1.03 5.19 9.12 1.00 5042 4879
Below, we compare the fixed effects estimates returned by our two models, suggesting that the choice of error distribution does not have a strong effect on our conclusions.
Figure 9.12: Comparison of fixed effects estimates for our normal model (blue) and our t model (orange).
9.5 Plot Code
################################################################################
### Figure 9.1
################################################################################
h95$group = factor (h95$group)
h95$g0_s = (h95$g0-mean(h95$g0)) / sd(h95$g0)
h95$gbar_s = (h95$gbar-mean(h95$gbar)) / sd(h95$gbar)
agg_data = aggregate (cbind ( pheight, g0,g0_s,gbar,gbar_s) ~ speaker+group,
data = h95, FUN = median)
cols2 = c(deepgreen, darkorange, skyblue, lavender)
par (mfrow = c(1,3), mar = c(4,4,1,1), oma = c(1,1,0,0))
plot (agg_data$g0, agg_data$pheight, cex =2, col = cols2[c(1:4)][agg_data$group],
xlim=c(4.4,5.9), pch=16,lwd=2,ylim = c(45,75),ylab="Height (inches)",
xlab="log f0 (Hz)")
grid()
abline(lm (pheight ~ g0, data = agg_data)$coefficients, lwd=2,lty=3)
text (5.6,72, cex=1.1, label=paste0("r = ",
round(cor(agg_data$g0,agg_data$pheight),2)))
legend (4.5,58, legend = c('b','g','m','w'), col = cols2,
pch=16,pt.cex=1.5,bty='n',cex=1.25)
plot (agg_data$gbar, agg_data$pheight, cex =2, col = cols2[c(1:4)][agg_data$group],
xlim=c(7.3,7.7), pch=16,lwd=2,ylim = c(45,75),ylab="Height (inches)",
xlab="log GBAR (Hz)")
grid()
abline(lm (pheight ~ gbar, data = agg_data)$coefficients, lwd=2,lty=3)
text (7.6,72, cex=1.1, label=paste0("r = ",
round(cor(agg_data$gbar,agg_data$pheight),2)))
plot (agg_data$gbar, agg_data$g0, cex =2, col = cols2[c(1:4)][agg_data$group],
xlim=c(7.3,7.7), pch=16,lwd=2,ylim = c(4.4,5.9),ylab="log f0 (Hz)",
xlab="log GBAR (Hz)")
grid()
abline(lm (g0 ~ gbar, data = agg_data)$coefficients, lwd=2,lty=3)
text (7.6,4.6, cex=1.1, label=paste0("r = ",
round(cor(agg_data$g0,agg_data$gbar),2)))
################################################################################
### Figure 9.2
################################################################################
normal_g0_gbar = readRDS ('../../models/9_height_perception_multi.RDS')
fixefs = brms::fixef (normal_g0_gbar)[,1]
n=100
x = seq (min(agg_data$g0_s),max(agg_data$g0_s), length.out=n)
y = seq (min(agg_data$gbar_s),max(agg_data$gbar_s), length.out=n)
z0 = seq (0,1,length.out=n)
xx = rep (x, each = n)
yy = rep (y, n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy
zz0 = matrix (z,n,n)
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz0, showscale = FALSE,name="Overall",
colorscale = "Portland")
fig3 <- fig3 %>% add_trace(agg_data, color = ~group, type="scatter3d",
colors = c(coral,yellow,deepgreen,teal), mode="markers")
fig3 <- fig3 %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,1),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig3
################################################################################
### Figure 9.3
################################################################################
fixefs = fixef (height_perception_multi)[,1]
n=100
x = seq (min(agg_data$g0_s),max(agg_data$g0_s), length.out=n)
y = seq (min(agg_data$gbar_s),max(agg_data$gbar_s), length.out=n)
xx = rep (x, each = n)
yy = rep (y, n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy
zz0 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + fixefs[4]
zz1 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy - fixefs[4]
zz2 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + fixefs[4] + fixefs[5]
zz3 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + fixefs[4] - fixefs[5]
zz4 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy - fixefs[4] + fixefs[5]
zz5 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy - fixefs[4] - fixefs[5]
zz6 = matrix (z,n,n)
#################################################################
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
#fig3 <- fig3 %>% add_markers()
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz0, showscale = FALSE,name="Overall",
colorscale = "Portland")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz1, showscale = FALSE,name="Adult",
colorscale = "YlGnBu")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz2, showscale = FALSE,name="Child",
colorscale = "YlGnBu")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz3, showscale = FALSE,name="w",
colorscale = "Picnic")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz4, showscale = FALSE,name="m",
colorscale = "Picnic")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz5, showscale = FALSE,name="g",
colorscale = "Picnic")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz6, showscale = FALSE,name="b",
colorscale = "Picnic")
fig3 <- fig3 %>% add_trace(agg_data, color = ~group, type="scatter3d",
                           colors = c(coral,yellow,deepgreen,teal), mode="markers")
fig3 <- fig3 %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,1),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig3
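The seven `zz` surfaces above are all the same plane shifted by sum-coded effects, so the repeated calculations could be factored into a small helper. This is an optional refactoring sketch; `make_plane` is a new function introduced here for illustration, not part of the original figure code:

```r
# Sketch only: build the n x n grid of predicted heights for a given
# offset added to the intercept (e.g. +/- the sum-coded adult effect).
make_plane = function (fixefs, xx, yy, n, offset = 0)
  matrix (fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + offset, n, n)

# equivalent to the calculations above, e.g.:
# zz1 = make_plane (fixefs, xx, yy, n,  fixefs[4])
# zz2 = make_plane (fixefs, xx, yy, n, -fixefs[4])
```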
################################################################################
### Figure 9.4
################################################################################
fixefs = fixef (height_perception_multi)[,1]
n=100
x = seq (min(agg_data$g0_s),max(agg_data$g0_s), length.out=n)
y = seq (min(agg_data$gbar_s),max(agg_data$gbar_s), length.out=n)
xx = rep (x, each = n)
yy = rep (y, n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy
zz0 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]+fixefs[8])*yy
zz1 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]-fixefs[8])*yy
zz2 = matrix (z,n,n)
z = fixefs[1] + (fixefs[2]+fixefs[8])*xx + (fixefs[3])*yy
zz3 = matrix (z,n,n)
z = fixefs[1] + (fixefs[2]-fixefs[8])*xx + (fixefs[3])*yy
zz4 = matrix (z,n,n)
#################################################################
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
#fig3 <- fig3 %>% add_markers()
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz0, showscale = FALSE,name="Overall",
colorscale = "Portland")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz1, showscale = FALSE,name="Adult-gbar",
colorscale = "YlGnBu")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz2, showscale = FALSE,name="Child-gbar",
colorscale = "Picnic")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz3, showscale = FALSE,name="Adult-g0",
colorscale = "YlGnBu")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz4, showscale = FALSE,name="Child-g0",
colorscale = "Picnic")
fig3 <- fig3 %>% add_trace(agg_data, color = ~group, type="scatter3d",
                           colors = c(coral,yellow,deepgreen,teal), mode="markers")
#fig3
fig <- subplot(fig3)
fig <- fig %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,1),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig
################################################################################
### Figure 9.5
################################################################################
fixefs = fixef (height_perception_multi)[,1]
n=100
x = seq (min(agg_data$g0_s),max(agg_data$g0_s), length.out=n)
y = seq (min(agg_data$gbar_s),max(agg_data$gbar_s), length.out=n)
xx = rep (x, each = n)
yy = rep (y, n)
#################################################################
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]+fixefs[8])*yy + fixefs[4]
zz1 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]-fixefs[8])*yy - fixefs[4]
zz2 = matrix (z,n,n)
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
#fig3 <- fig3 %>% add_markers()
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz1, showscale = FALSE,name="Adult",
colorscale = "YlGnBu")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz2, showscale = FALSE,name="Child",
colorscale = "Picnic")
fig3 <- fig3 %>% add_trace(agg_data, color = ~group, type="scatter3d",
                           colors = c(coral,yellow,deepgreen,teal), mode="markers")
#fig3
fig <- subplot(fig3)
fig <- fig %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,1),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig
################################################################################
### Figure 9.6
################################################################################
par (mfrow = c(1,4), mar = c(3,1,3,1), oma = c(2,4,0,0))
hist (prediction_error[,1], breaks = 20, col = cols[2],main="No RE",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
hist (prediction_error[,2], breaks = 20, col = cols[1],main="Sp Only",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
hist (prediction_error[,3], breaks = 20, col = cols[6],main="Su Only",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
hist (prediction_error[,4], breaks = 20, col = cols[5],main="Full RE",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
mtext (side=1, outer = TRUE, text="Prediction Error (Inches)", line= .7)
mtext (side=2, outer = TRUE, text="Density", line= 2)
################################################################################
### Figure 9.7
################################################################################
par (mfrow = c(1,2), mar = c(4,4,1,1))
cols2 = c(skyblue,coral,deepgreen)
nus = c(1,10,1000)
plot (seq (-6.5,6.5,.01), dt (seq (-6.5,6.5,.01),nus[1]), xlim = c(-5.7,5.7),
col = cols2[1], lwd=3, ylim = c(0,.45), type = 'l',ylab="Density",
xlab='x')
for (i in 2:3) curve (dt (x,nus[i]), add = TRUE, col = cols2[i],lwd=3,
                      xlim = c(-6.5,6.5))
grid()
plot (seq (-6.5,6.5,.01), dt (seq (-6.5,6.5,.01),nus[1]), xlim = c(-5.7,5.7),
col = cols2[1], lwd=3, ylim = c(0.0000001,.45), log='y', type = 'l',
ylab="Density",xlab="x")
for (i in 2:3) curve (dt (x,nus[i]), add = TRUE, col = cols2[i],lwd=3,
                      xlim = c(-6.5,6.5))
grid()
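As a quick numerical check on what Figure 9.7 shows, the t density approaches the standard normal as the degrees of freedom grow. This small check is new here, not part of the original figure code:

```r
# With nu = 1000 the t density is nearly indistinguishable from the
# standard normal density across the plotted range.
x = seq (-5, 5, .1)
max (abs (dt (x, 1000) - dnorm (x)))  # very small
```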
################################################################################
### Figure 9.8
################################################################################
par (mfrow = c(2,2), mar = c(2,2,1,1), oma = c(2,2,0,0))
set.seed(1)
x_norm = rnorm(10000,0,1)
x_t = rt(10000,4)
hist (x_norm, breaks = 50, xlim = c(-4,4),main="",col=teal,freq=FALSE,
xlab="",ylab= "")
points (x_norm,rep(-.005, length(x_norm)),pch=16,col=teal,cex=1)
hist (x_norm, breaks = 50, xlim = c(-12,12),main="",col=teal,freq=FALSE,
xlab="",ylab= "")
points (x_norm,rep(-.005, length(x_norm)),pch=16,col=teal,cex=1)
hist (x_t, breaks = 50, xlim = c(-4,4),main="",col=lavender,freq=FALSE,
xlab="",ylab= "")
points (x_t,rep(-.005,length(x_t)),pch=16,col=lavender,cex=1)
hist (x_t, breaks = 50, xlim = c(-12,12),main="",col=lavender,freq=FALSE,
xlab="",ylab= "")
points (x_t,rep(-.005,length(x_t)),pch=16,col=lavender,cex=1)
mtext (side = 2, outer = TRUE, text = "Density", line = 1)
################################################################################
### Figure 9.9
################################################################################
par (mfrow = c(1,3), mar = c(4,4,1,1))
hist (resids[,1], breaks = 20, main="", col =skyblue, freq = FALSE,
xlab = "Residuals")
abline (h = c(.1,.01), lty = 3, v = c(seq(-15,15,5)))
plot (density (resids[,1]), log='y', main = '', ylim = c(1e-07,1),
xlim = range(resids[,1]), lwd=3,col=skyblue,xlab="Residuals")
curve (dnorm (x, 0, sd(resids[,1])), add = TRUE, xlim = c(-20,20), lwd=3,
col=coral)
grid()
plot (density (resids[,1]), log='y', main = '', ylim = c(1e-07,1),
xlim = range(resids[,1]),col=skyblue,lwd=3,xlab="Residuals")
# t density with 10 degrees of freedom, with x rescaled to the spread of
# the residuals (a shape comparison; the density is not renormalized)
curve_dt = dt (seq(-6,6,.1), 10)
x2 = seq(-6,6,.1) * sd(resids[,1])
lines (x2, curve_dt, lwd=3,col=deepgreen)
grid()
################################################################################
### Figure 9.10
###############################################################################
par (mfrow = c(1,2), mar = c(4,4,2,1))
curve (dgamma(x, 2, 0.1), xlim = c(0,500), main="dgamma(x, 2, 0.1)",
ylab = "Density",lwd=3,col=deepgreen)
curve (dgamma(x, 2, 0.01), xlim = c(0,500), main="dgamma(x, 2, 0.01)",
ylab = "Density",lwd=3,col=yellow)
################################################################################
### Figure 9.11
###############################################################################
par (mfrow = c(1,1), mar = c(4,4,1,1))
brmplot (fixef(height_perception_multi)[-1,], col = skyblue,xs=(1:12)-.1)
brmplot (fixef(height_perception_student)[-1,], col=darkorange,
add = TRUE,xs=(1:12)+.1, labels="")
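For reference, a model with t-distributed errors like `height_perception_student` can be fit in `brms` by swapping in `family = student()` and placing the gamma prior from Figure 9.10 on the degrees-of-freedom parameter ν. The right-hand side below is a simplified placeholder, not the chapter's actual model formula:

```r
# Sketch only: Student-t errors via family = student(); the gamma(2, 0.1)
# prior shown in Figure 9.10 is set on nu. The formula is a placeholder.
height_perception_student =
  brm (pheight ~ g0_s + gbar_s + (1|subj), data = h95,
       family = student(),
       prior = set_prior ("gamma(2, 0.1)", class = "nu"))
```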